fix(gateway): plugin config from YAML, thread-safety, CI stability#350
Merged
fix(gateway): plugin config from YAML, thread-safety, CI stability#350
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Adds a new gateway unit test intended to reproduce/lock in the reported bug where plugin configuration provided via --params-file YAML is not propagated to plugins (because the current implementation reads from NodeOptions::parameter_overrides(), which stays empty in that launch path).
Changes:
- Add
test_plugin_config.cppto simulaterclcpp::init(... --params-file ...)and check plugin config visibility. - Register the new
test_plugin_configgtest target inros2_medkit_gatewayCMake.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
src/ros2_medkit_gateway/test/test_plugin_config.cpp |
New test attempting to prove the --params-file plugin-config propagation bug. |
src/ros2_medkit_gateway/CMakeLists.txt |
Adds test_plugin_config target and assigns it a unique ROS domain via medkit_set_test_domain. |
54b0859 to
2387af4
Compare
mfaferek93
reviewed
Apr 4, 2026
mfaferek93
reviewed
Apr 4, 2026
mfaferek93
reviewed
Apr 4, 2026
mfaferek93
reviewed
Apr 4, 2026
mfaferek93
reviewed
Apr 4, 2026
e13f49a to
113bdd2
Compare
…t passed to plugins extract_plugin_config() reads from get_node_options().parameter_overrides(), which is always empty for --params-file YAML params. These go into the ROS 2 global rcl context instead. Plugins always receive empty config and fall back to defaults. The test inits rclcpp with --params-file containing plugin config, creates a GatewayNode with default NodeOptions (production path), and asserts that the YAML plugin params are accessible. Currently fails with: "Total overrides: 0" Closes #349
…lobal args extract_plugin_config() read from NodeOptions::parameter_overrides() which is always empty for --params-file YAML params (they go to the rcl global context instead). Plugins received empty config and fell back to defaults. Fix: declare_plugin_params_from_yaml() accesses rcl_arguments_get_param_overrides() from the global context, finds params matching the plugin prefix, and declares them on the node with their typed values. extract_plugin_config() then reads them via list_parameters()/get_parameter(). The NodeOptions::parameter_overrides path is preserved as the primary source for programmatic overrides (used in unit tests).
… test The previous test created a full GatewayNode which hung on Humble/Rolling CI due to heavy DDS/HTTP initialization. Replace with a plain rclcpp::Node that directly tests the declare_plugin_params_from_yaml() logic: 1. Writes YAML with plugin params (string, int, bool, double) 2. Inits rclcpp with --params-file 3. Confirms parameter_overrides() is empty (proves the bug) 4. Calls declare_plugin_params_from_yaml() and verifies all types resolve Test runs in ~50ms instead of timing out.
…header - Move function to param_utils.hpp so test uses the real implementation instead of a diverging copy - Use unique_ptr RAII guard for rcl_params_t instead of manual cleanup - Log param name on declaration failure instead of silent catch(...) - Warn when encountering unsupported array param types
Commit 5224420 removed libcpp-httplib-dev from rosdep skip-keys thinking the vendored cpp-httplib fallback would cover it. That's true at the CMake level, but rosdep runs first and fails on humble (Ubuntu 22.04) because libcpp-httplib-dev is not available in the apt repo. Add it back to skip-keys so rosdep skips the lookup on every distro; the CMake macro medkit_find_cpp_httplib then uses the vendored header in src/ros2_medkit_gateway/src/vendored/cpp_httplib/ everywhere. Single fallback path for humble/jazzy/rolling = consistent builds. Fixes the build-and-push (humble) job failure on main.
Two race conditions in sample_topic() called from httplib threads while the executor spins on the main thread: 1. create_generic_subscription from httplib thread races with the executor iterating callback groups. Fix: dedicated MutuallyExclusiveCallbackGroup for sampling subscriptions isolates them from the default group. A sampling_mutex_ serializes subscription lifecycle so only one sampling subscription is active at a time. 2. Callback lambda captures &message_promise and &received by reference to stack-local variables. On timeout, the function returns and destroys the stack while the executor may still be dispatching the callback. Fix: heap-allocate shared state (SampleState) behind a shared_ptr so the callback keeps state alive regardless of the caller's lifetime. Adds 3 thread-safety tests: - SampleTopicTimeoutDoesNotCrash (20 rapid timeouts) - ConcurrentSampleTopicDoesNotCrash (4 threads x 5 iterations) - RapidTimeoutWithActivePublisher (30 boundary-condition iterations) Manifested as non-deterministic gateway SIGSEGV (exit -11) during test_data_read on rolling CI.
…ublisher When a data trigger was created before its topic was discoverable (late publisher scenario), resolved_topic_name was empty and TriggerManager skipped calling topic_subscriber_->subscribe(). The trigger appeared "active" but had no backing subscription - no SSE events were ever delivered. Add deferred re-resolution: - TriggerManager stores unresolved data triggers at creation time - GatewayNode wires a ResolveTopicFn (cache lookup by entity_id + resource_path suffix matching) - TriggerTopicSubscriber's 5s retry timer calls TriggerManager::retry_unresolved_triggers() - Resolved triggers get their topic_name updated and subscribed Adds 3 unit tests for deferred resolution safety. Fixes test_triggers_late_publisher flaking on humble where DDS discovery is slower than on jazzy/rolling.
Commit f00a156 added callback_mutex_ synchronization to rpm_sensor as an experiment but never followed up with the other demo nodes. On humble, TimerBase::cancel() does not wait for in-flight callbacks; if the destructor resets the publisher while a callback is running, the callback dereferences a null SharedPtr causing SIGSEGV (exit -11). Apply the same pattern to all 7 remaining demo nodes with timer+publisher: - beacon_publisher, brake_actuator, brake_pressure_sensor, door_status_sensor, engine_temp_sensor, lidar_sensor, light_controller Pattern: callback_mutex_ locked in every timer callback (with null-check on publisher) and in the destructor after cancel() before resetting members. lidar_sensor required extracting check_and_report_faults_unlocked() to avoid deadlock when on_parameter_change (holding lock) calls fault check. Fixes recurring humble SIGSEGV in test_entity_routing, test_data_read, test_entity_model_runtime, and any test using demo nodes.
9cd4291 to
4452def
Compare
…inResponse API The PluginContext API was refactored to use PluginRequest/PluginResponse wrapper types instead of raw httplib types. The test's FakePluginContext and method stubs were not updated, breaking the build on all platforms. - Replace send_json/send_error stubs with PluginRequest/PluginResponse stubs - Update validate_entity_for_route override signature - Remove httplib.h include (no longer needed)
… API stubs Two build issues introduced on main by the VDA 5050 merge: 1. medkit_find_cpp_httplib() called without VENDORED_DIR - fails on humble where libcpp-httplib-dev is unavailable. Point to gateway's vendored copy like graph_provider does. 2. Test stubs used old PluginContext API (httplib types). Update to PluginRequest/PluginResponse wrapper types.
…ption creation SingleThreadedExecutor does not synchronize Node::create_generic_subscription() called from external threads (cpp-httplib handlers) with its internal wait set iteration. This causes non-deterministic SIGSEGV on rolling where rclcpp internals are less forgiving of concurrent node manipulation. MultiThreadedExecutor properly synchronizes subscription lifecycle across threads. All gateway callbacks are already protected by mutexes (EntityCache, LogManager, TriggerManager, TriggerTopicSubscriber), so concurrent callback execution is safe. This is the definitive fix for the rolling SIGSEGV in test_data_read. The previous sampling_mutex_ + MutuallyExclusiveCallbackGroup fix reduced the race window but did not eliminate it.
mfaferek93
approved these changes
Apr 6, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Pull Request
Summary
Originally a fix for
extract_plugin_config()ignoring--params-fileYAML params. Expanded to address multiple thread-safety and CI stability issues discovered during review and CI validation.Plugin config fix:
extract_plugin_config()read fromNodeOptions::parameter_overrides()which is empty for--params-fileparamsdeclare_plugin_params_from_yaml()(shared headerparam_utils.hpp) that reads from global rcl contextThread-safety fixes:
NativeTopicSampler::sample_topic(): dedicated callback group, sampling mutex, heap-allocated callback state (shared_ptr instead of stack references)SingleThreadedExecutortoMultiThreadedExecutor- required because httplib threads callcreate_generic_subscriptionwhich races with executor iterationCI stability fixes:
libcpp-httplib-devto rosdep skip-keys for consistent vendored buildssovd_service_interface: VENDORED_DIR for humble + test stubs updated for PluginRequest/PluginResponse APIIssue
Type
Testing
test_plugin_config- proves YAML plugin params reach the gatewayNativeTopicSamplerThreadSafetyTest- 3 tests: concurrent sampling, rapid timeout, timeout-no-crashTriggerManagerTest- 3 tests: deferred resolution create/retry/safetytest_triggers_late_publisher- integration test for deferred trigger resolutionChecklist